fix: force-refresh dev release data before triggering agent update #56

Merged
TerrifiedBug merged 3 commits into main from fix/dev-agent-update-race-condition on Mar 7, 2026

@TerrifiedBug
Owner

Summary

  • Fixes a TOCTOU (time-of-check to time-of-use) race condition that caused an infinite update loop on dev agents
  • When triggering a dev agent update, the server now force-refreshes the dev release info from GitHub before setting the pendingAction, ensuring the checksum always matches the binary at the download URL

Root cause

The download URL releases/download/dev/vf-agent-linux-amd64 is a rolling pointer — every push to main rebuilds the dev release with a new binary. When the fleet UI caches version/checksum data and a new build lands before the user clicks Update, the agent downloads the new binary but the cached checksum doesn't match, causing a silent update failure loop:

  1. Version check caches dev-bbcb3a7 with checksum 325f58ed...
  2. New commit pushes → CI rebuilds dev release as dev-281bf67 with checksum 9f0d8898...
  3. User clicks Update → sends stale checksum 325f58ed...
  4. Agent downloads the dev-281bf67 binary → checksum mismatch → update fails
  5. pendingAction cleared → "update available" shows again → loop
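
The sequence above can be replayed in a small model. This is a minimal sketch, not the project's code: `ReleaseInfo`, `agentVerifies`, and `replayRace` are hypothetical names, and the checksum strings are the truncated values from the description, used only as opaque labels.

```typescript
// Hypothetical model of the rolling dev release; none of these names come from the PR.
interface ReleaseInfo {
  version: string;
  checksum: string; // truncated values from the description, treated as opaque labels
}

// The agent accepts a download only if its checksum equals the one it was told to expect.
function agentVerifies(expectedChecksum: string, downloaded: ReleaseInfo): boolean {
  return downloaded.checksum === expectedChecksum;
}

function replayRace(): { sentChecksum: string; downloadedVersion: string; verified: boolean } {
  // Step 1: the version check caches what is currently published.
  let published: ReleaseInfo = { version: "dev-bbcb3a7", checksum: "325f58ed..." };
  const cached: ReleaseInfo = { ...published };

  // Step 2: CI rebuilds the dev release behind the same download URL.
  published = { version: "dev-281bf67", checksum: "9f0d8898..." };

  // Step 3: the update is triggered with the cached (now stale) checksum.
  const sentChecksum = cached.checksum;

  // Step 4: the agent downloads whatever is published right now and verifies it.
  const verified = agentVerifies(sentChecksum, published);

  return { sentChecksum, downloadedVersion: published.version, verified };
}

console.log(replayRace());
```

As long as the cached entry stays stale, retrying the trigger yields the same mismatch, which is what turns a single failure into a loop.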

Test plan

  • Trigger a dev agent update when the dev release has been recently rebuilt
  • Verify the server-side force-refresh corrects the version/checksum before setting pendingAction
  • Confirm agent successfully downloads, verifies, and applies the update

Rolling dev releases replace the binary at the download URL on every
push to main. When the UI caches version/checksum data and a new build
lands before the user clicks Update, the agent downloads the new binary
but the checksum from the stale cache doesn't match — causing a silent
update failure loop.

Force-refresh dev release info server-side in triggerAgentUpdate so the
checksum always matches the binary currently at the download URL.
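
One way to picture the force-refresh is a cached lookup with a bypass flag. The sketch below is illustrative only: `makeDevReleaseCache`, `fetchRelease`, and the TTL are hypothetical, the lookup is synchronous for brevity (a real GitHub call would be asynchronous), and it does not reproduce the actual triggerAgentUpdate code.

```typescript
interface DevRelease {
  version: string;
  checksum: string;
}

// Hypothetical cache wrapper. `fetchRelease` stands in for the GitHub lookup,
// and `now` is injectable so the behaviour can be exercised without waiting.
function makeDevReleaseCache(
  fetchRelease: () => DevRelease,
  ttlMs: number,
  now: () => number = Date.now,
): (force?: boolean) => DevRelease {
  let cached: { value: DevRelease; fetchedAt: number } | null = null;

  return function getDevRelease(force: boolean = false): DevRelease {
    const entry = cached;
    // Serve from cache only when not forced and the entry is still within its TTL.
    if (!force && entry !== null && now() - entry.fetchedAt < ttlMs) {
      return entry.value;
    }
    // Otherwise go back to the source and overwrite the cached entry.
    const value = fetchRelease();
    cached = { value, fetchedAt: now() };
    return value;
  };
}

// Usage: a UI-style read may be served from cache, while the update trigger passes force = true.
let currentlyPublished: DevRelease = { version: "dev-bbcb3a7", checksum: "325f58ed..." };
const getDevRelease = makeDevReleaseCache(() => currentlyPublished, 60_000);

getDevRelease(); // warms the cache
currentlyPublished = { version: "dev-281bf67", checksum: "9f0d8898..." };

console.log(getDevRelease().version);     // still the cached entry
console.log(getDevRelease(true).version); // re-fetched just before the write
```

The point of the flag is ordering: the refresh happens inside the trigger path, so the window between reading the checksum and writing it shrinks to a single request.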
github-actions bot added the fix label on Mar 7, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 7, 2026

Greptile Summary

This PR fixes a TOCTOU race condition in the triggerAgentUpdate mutation by force-refreshing dev release metadata from GitHub immediately before writing pendingAction. The dev release URL is a rolling pointer — new CI builds can replace the binary between when the fleet UI caches the version/checksum and when the user clicks Update.

Key improvements:

  • checkDevAgentVersion(true) (force-refresh) is called whenever targetVersion starts with dev-
  • If the fresh version differs from the cached one and a fresh checksum is available, both targetVersion and checksum are replaced before the DB write
  • If the fresh version differs but no checksum can be retrieved, an INTERNAL_SERVER_ERROR is thrown immediately, preventing silent fallback to a known-stale checksum
  • When the freshly-fetched version matches what the UI sent, no substitution is needed
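
The four cases above can be condensed into one pure function. This is a sketch under assumptions rather than the merged code: `resolveDevTarget` and both interfaces are hypothetical, a plain `Error` stands in for the INTERNAL_SERVER_ERROR, and an unavailable fresh version (`latestVersion === null`) is simplified to "keep the request unchanged".

```typescript
// Hypothetical shapes; the real return type of checkDevAgentVersion is not shown in the PR.
interface FreshDevRelease {
  latestVersion: string | null;
  checksum: string | null;
}

interface UpdateTarget {
  targetVersion: string;
  checksum: string;
}

// Decide which version/checksum pair should be written as the pending action.
function resolveDevTarget(requested: UpdateTarget, fresh: FreshDevRelease): UpdateTarget {
  // Only rolling dev releases need the refresh; other versions pass through.
  if (!requested.targetVersion.startsWith("dev-")) {
    return requested;
  }
  // Same version as the UI sent (or nothing fresh to compare against): no substitution.
  if (fresh.latestVersion === null || fresh.latestVersion === requested.targetVersion) {
    return requested;
  }
  // The version moved but there is no checksum for it: fail fast instead of
  // sending a checksum that is known not to match the current binary.
  if (fresh.checksum === null) {
    throw new Error("INTERNAL_SERVER_ERROR: dev release changed but its checksum is unavailable");
  }
  // The version moved and a checksum is available: replace both together.
  return { targetVersion: fresh.latestVersion, checksum: fresh.checksum };
}

// Usage: the UI sent a stale pair, and the refresh reports a newer build.
const resolved = resolveDevTarget(
  { targetVersion: "dev-bbcb3a7", checksum: "325f58ed..." },
  { latestVersion: "dev-281bf67", checksum: "9f0d8898..." },
);
console.log(resolved.targetVersion, resolved.checksum);
```

Keeping the decision separate from the fetch and the DB write makes each branch easy to exercise on its own.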

The fix correctly handles the primary TOCTOU bug and the fallback-to-stale-checksum edge case that would perpetuate the infinite loop.

Confidence Score: 4/5

  • Safe to merge — the TOCTOU race condition is correctly addressed with proper force-refresh and error handling.
  • The fix is logically sound and correctly handles the race condition. The primary concern (stale checksum after binary replacement) is prevented by force-refreshing dev release metadata before DB write. The edge case where the fresh version differs but no checksum is available is properly handled with an explicit error throw rather than silent fallback. The only residual scenario is when fresh.latestVersion is null (GitHub entirely unreachable), in which case the code falls back to the DB-cached version; if that cached version equals the client-supplied targetVersion, the original checksum is used unchanged. This degraded-mode behavior is acceptable under truly exceptional circumstances (GitHub down) and does not constitute a correctness bug under normal operation.
  • No files require special attention.

Last reviewed commit: 35d9d30

When the version has changed but the checksum can't be retrieved,
fail fast instead of silently proceeding with a known-stale checksum
that would cause the agent to hit a checksum mismatch on download.
@TerrifiedBug
Owner Author

@greptile review

TerrifiedBug merged commit 367517e into main on Mar 7, 2026
10 checks passed
TerrifiedBug deleted the fix/dev-agent-update-race-condition branch on March 7, 2026 at 19:55